Data in Brief
Elsevier BV
Preprints posted in the last 30 days, ranked by how well they match Data in Brief's content profile, based on 13 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Nikitin, A. G.; Renson, V.; Ivanova, S.; Neff, N. C.; Straioto, H.; Svyryd, S.
Five millennia ago, nomadic people from the North Pontic steppe had a profound impact on the course of Eurasian prehistory. However, little is known about their mobility patterns within their home region. To address this knowledge gap, we surveyed the strontium isotope landscape of people interred in 4th-3rd millennium BCE burial mounds (kurgans) in the western part of the North Pontic steppe. By analyzing the strontium signature in human bone and dentin, we established strontium baseline values for the region. We then compared enamel strontium ratios from 25 selected individuals with this baseline and with published strontium data from across the North Pontic steppe. Enamel strontium ratios show that some individuals interred in the northwest North Pontic fall within the regional baseline range, whereas others overlap with values reported for the eastern North Pontic steppe. Using carbon (δ13C) and nitrogen (δ15N) stable isotope data in conjunction with these results, we further determined that some individuals interred in the western Pontic steppe either spent the later part of their lives in the west Caspian steppe or were affected by physiological stress during their lifetime. By integrating our data with published isotopic datasets, we produced the first baseline heatmap of the North Pontic steppe for the c. 4000-2000 BCE period.
Bystrom, C.; Douglass, K.; Gupta, M.
Background: Mucopolysaccharidosis type IIIA (MPS IIIA; Sanfilippo syndrome) is a fatal neurodegenerative lysosomal storage disorder caused by impaired degradation of heparan sulfate (HS). Despite rapid advances in gene and enzyme therapies, there remains a critical need for an analytically validated, quantitative biomarker that accurately reflects central nervous system (CNS) substrate burden. Such a biomarker would be a valuable tool for assessing disease progression and monitoring therapeutic efficacy. Objective: This study describes the method development, fit-for-purpose validation, and preliminary clinical application of a quantitative liquid chromatography-tandem mass spectrometry (LC-MS/MS) assay for the HS-derived disaccharide N-sulfoglucosamine-glucuronic acid (GlcNS-GlcUA) in human cerebrospinal fluid (CSF), a critical biomarker for diagnosis, disease monitoring, and regulatory evaluation of emerging MPS IIIA therapies. Methods: A structurally defined GlcNS-GlcUA reference standard and its [13C6]-labeled internal standard were used in a workflow combining 1-phenyl-3-methyl-5-pyrazolone derivatization with LC-MS/MS detection. Results: The method exhibited acceptable linearity across 0.005-0.500 nmol/mL (r ≥ 0.9976), with intra- and inter-assay imprecision ≤3.5% CV and accuracy within 95%-110% of nominal concentrations. No matrix interference, hemolysis interference, or carryover was observed, and the analyte remained stable under the freeze-thaw storage conditions tested. Application of the method to 12 CSF samples from patients with MPS IIIA yielded quantifiable GlcNS-GlcUA levels ranging from 0.0054 to 0.106 nmol/mL, confirming its suitability for clinical and regulatory use. Comparison of MPS IIIA sample results between the development laboratory and the contract research organization laboratory supports robust inter-laboratory assay transfer. Conclusions: This validated LC-MS/MS method establishes a regulatory-grade quantitative assay for measuring CSF HS in MPS IIIA.
Its high analytical sensitivity and reproducibility enable reliable assessment of CNS substrate reduction and pharmacodynamic response, supporting biomarker-driven therapeutic development and accelerated approval pathways for neuronopathic mucopolysaccharidoses.
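The imprecision and accuracy figures quoted above follow the usual bioanalytical definitions: %CV is the sample standard deviation divided by the mean (times 100), and accuracy is the mean measured concentration as a percentage of the nominal concentration. The sketch below shows those two calculations on hypothetical quality-control replicates. The replicate values are invented, and treating the reported ranges (≤3.5% CV, 95%-110% accuracy) as pass/fail limits is an illustrative assumption, not the study's stated acceptance criteria.

```python
from statistics import mean, stdev


def percent_cv(replicates: list[float]) -> float:
    """Coefficient of variation (%) = sample SD / mean x 100."""
    return stdev(replicates) / mean(replicates) * 100


def percent_accuracy(replicates: list[float], nominal: float) -> float:
    """Mean measured concentration as a percentage of the nominal value."""
    return mean(replicates) / nominal * 100


# Hypothetical QC replicates (nmol/mL) at a nominal 0.050 nmol/mL level
qc = [0.051, 0.049, 0.052, 0.050, 0.053]
cv = percent_cv(qc)
acc = percent_accuracy(qc, nominal=0.050)
print(f"CV = {cv:.1f}%, accuracy = {acc:.1f}%")
# Illustrative limits only, borrowed from the ranges reported in the abstract
print("within illustrative limits:", cv <= 3.5 and 95 <= acc <= 110)
```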
Asmundsdottir, R. D.; Troche, G.; Olsen, J. V.; Martinez de Pinillos, M.; Martinon-Torres, M.; Schrader, S.; Welker, F.
Dental enamel, the hardest mineralised tissue in the human body, has proven to be an excellent source of ancient proteins, which have been found to survive within dental enamel for at least twenty million years. In archaeological and palaeontological contexts, the enamel proteome is generally considered to be rather small, consisting of about twelve proteins, most of which are unique to enamel. During amelogenesis these proteins undergo in vivo digestion by matrix metalloproteinase 20 (MMP20) and kallikrein 4 (KLK4), as well as serine phosphorylation by family with sequence similarity 20, member C (FAM20C), processes that alter their characteristics. Knowledge of the previously understudied influence of amelogenesis on the archaeological human dental enamel proteome could benefit various palaeoproteomic analyses, especially in a human evolutionary context. Here we present archaeological dental enamel proteomes and explore protein cleavage patterns and sequence coverage to estimate the effects of in vivo digestion, as well as phosphorylation patterns. Additionally, we present a new phosphorylation-based marker to estimate genetic sex.
Leite, A.; Welker, F.; Godinho, R. M.; Gillis, R. E.; Islas, V. V.; Fagernas, Z.
Ancient human dental calculus is one of the richest archives of archaeological biomolecular information, providing direct evidence of diet, oral health, and the oral microbiome. Proteomic analyses of this biological matrix have so far focused mainly on oral microbes and dietary proteins, with milk proteins such as beta-lactoglobulin (BLG) providing the largest corpus of proteomic evidence. Despite the close relationship between the various stages of dental calculus formation and mineralization and the dental enamel surface, proteins from the dental enamel matrix have not previously been reported outside dental enamel tissue. Here we reanalysed 498 ancient dental calculus proteomes from 14 published studies (n=434 individuals) reporting the presence of BLG; these studies span the Neolithic to the Victorian Era and applied different protein extraction protocols (FASP, GASP, SP3 and in-solution digestion). Dental enamel matrix proteins were identified in ten studies (n=37 individuals), with amelogenin being the most frequently detected. Enamel peptides occurred more often in studies that applied SP3, although amelogenin was successfully identified with both SP3 and FASP. Structural proteins, including enamelin, ameloblastin, and MMP20, were also identified. The detection of AMELX and AMELY peptide sequences provided new insights into cases where sex was previously undetermined. These findings establish dental enamel proteins as a new category of biomolecules detected in dental calculus, broadening its application beyond diet and microbiome studies to possible sex estimation.
Highlights:
- Dental calculus entraps oral microbes along with endogenous and exogenous particles during formation and mineralization
- We reanalyse 14 published ancient dental calculus studies (n = 434 individuals) spanning the Neolithic to the Victorian Era
- Dental enamel proteins AMELX, AMELY, AMBN, COL17A1, ENAM and MMP20 are identified in ancient human dental calculus
- Amelogenin was the most frequently detected enamel protein
- We expand dental calculus palaeoproteomics beyond diet and oral microbiome to potentially include sex estimation
Leone, M.; Rech De Laval, V.; Drage, H. B.; Waterhouse, R. M.; Robinson-Rechavi, M.
Integrating taxonomic data from various sources is a significant challenge in biodiversity research, owing to non-standardized nomenclature and evolving species classifications. Discrepancies between major repositories such as the Global Biodiversity Information Facility (GBIF) and the National Center for Biotechnology Information (NCBI), as well as citizen science platforms such as iNaturalist, lead to fragmented and sometimes inaccurate biological data. We present TaxonMatch, a tool designed to address these challenges. TaxonMatch aligns taxonomic names, resolves synonymy, and corrects typographical and structural inconsistencies across databases. We show how it can be used to build a common backbone arthropod taxonomy over NCBI, GBIF and iNaturalist, to find the closest molecular data to a given fossil, and to identify IUCN endangered species with molecular data. TaxonMatch provides a cohesive taxonomic framework and a consistent taxonomic backbone, and can be applied to any taxonomic source. The tool is available at https://github.com/MoultDB/TaxonMatch.
Islam, M. N.; Khan, S. A.; Lanszki, Z.; Abraham, A.; Akter, S.; Bhuyan, A. A. M.; Zana, B.; Islam, M. S.; Zeghbib, S.; Leiner, K.; Jani, A. S. M. R.; Sarder, M. J. U.; Islam, M. H.; Debnath, N. C.; Uelmen, J. A.; Banyai, K.; Kemenesi, G.; Chowdhury, S.
Background: Mobile laboratory diagnostic technologies for Nipah virus outbreak prevention, mitigation and response remain limited, despite the critical need for such capacity in remote, low-resource regions where most cases occur. We aimed to address this gap by implementing a workflow that includes method development, laboratory validation, and field demonstration of a mobile Nipah virus complex diagnostic solution. Methods: We developed a flexible mobile laboratory workflow incorporating PCR capacity, a novel amplicon-based sequencing protocol, and a validated Nipah virus inactivation procedure. Following development and validation, we demonstrated the feasibility of this workflow through repeated field sampling of bat colonies in Nipah virus-endemic regions of Bangladesh across multiple field campaigns. Findings: We demonstrated the feasibility of this system for early outbreak response and as a potential early warning tool before the emergence of human cases. Two urine samples from flying foxes tested positive, and we performed full-scale on-site analysis, including qPCR diagnostics and NGS sequencing, within 24 hours. Interpretation: As highlighted in the present study, active surveillance enables outbreak prevention by identifying, in real time, bat colonies that are actively shedding virus, even in rural settings. In addition, this method can provide rapid, on-site sequence data to track and better understand the genomic diversity of Nipah virus in natural reservoirs during both outbreak and non-outbreak periods. In this study, we aimed to establish the foundations of a standard procedure for safe and rapid field testing for Nipah virus in remote areas.
Mauvisseau, Q.; Ewer, I.; Blumeris, I.; Iren Bongo, S.; Filipe Brito de Oliveira, L.; Gouvea, B.; Carolina Cei, A.; Ferreira Rodrigues, K.; de Arruda Francisco, J.; Sletteng Garvang, E.; Marena do Rego Henriques, V.; Hurtado Solano, S.; Kvalheim, L.; Kaylynne Lawrence, S.; Ramalho Maciel, B.; Isanda Masaki, H.; Fortunate Mashaphu, M.; Masimula, L.; Prudent Mokgokong, S.; Katrin Onshuus, E.; Lima Paiva, B.; Parker-Allie, F.; Du Plessis, M.; Puzicha, M.; Gabriel Da Silva Solano Reis, O.; Speelman, G.; Moritz Splitthof, W.; Stocco de Lima, A. C.; Strindberg, H.; Smoge Saevik, O.; Tafjord, N. J. D
Environmental DNA metabarcoding is a powerful monitoring tool for assessing aquatic biodiversity, as well as the sustainability and impacts of fisheries and aquaculture. However, conventional laboratory workflows remain time-consuming and dependent on dedicated infrastructures. Here, we present a field trial of a fully portable, off-grid eDNA metabarcoding pipeline that enables end-to-end analysis within a few days using compact equipment, including a BentoLab workstation and an Oxford Nanopore Technologies (ONT) MinION sequencer. The workflow was implemented during two international training courses in Norway and Brazil, where students and early career researchers collected environmental samples, extracted and amplified DNA, prepared DNA libraries, and sequenced on-site before performing bioinformatics and statistical analyses. In the case study detailed here, seven eDNA samples collected and analysed on-site in the Oslofjord allowed detection of 16 fish and elasmobranch species. Although overall diversity was lower than in earlier studies using Illumina-based sequencing, our protocol reliably detected key species and demonstrates that portable eDNA metabarcoding is feasible for rapid ecological assessment, surveillance of high-risk regions and/or deployment in remote or resource-limited settings.
Rojo-Bartolome, I.; Ibanez, J.; Cancio, I.; Ortiz-Zarragoitia, M.; Bilbao, E.
Transcriptomic analyses are widely used to elucidate the molecular mechanisms driving gametogenesis and reproduction in fish, yet their accuracy depends heavily on appropriate normalization of gene expression data. Conventional approaches that rely on single or multiple reference genes are problematic during teleost oogenesis, as profound structural and physiological remodeling of the ovary challenges the assumption that commonly used reference transcripts remain stable. In this study, we assessed by qPCR the transcriptional variability of four widely used reference genes (actb, ef-1, gapdh, and 18S rRNA) throughout the oogenic cycle of the thicklip grey mullet (Chelon labrosus) using geNorm and NormFinder analyses, and we additionally evaluated total cDNA concentration as an alternative normalization factor. To examine the performance and interpretive consequences of each normalization strategy, we compared expression patterns of key steroidogenic genes (star, cyp19a1a, and cyp11b) normalized by individual reference genes, combinations of reference genes, or total cDNA concentration. All evaluated reference genes displayed notable transcriptional variability across oogenesis, confirming their limited suitability as sole internal controls. In contrast, normalization approaches integrating multiple reference genes and/or total cDNA concentration consistently provided greater stability and more reliable biological interpretation. These results support a refined and more robust normalization framework for transcriptional analyses in fish ovaries, particularly during stages of extensive tissue remodeling. Our findings demonstrate that cDNA-based normalization is straightforward, rapid, and easy to implement across laboratories, providing a practical alternative for achieving accurate, reproducible transcript quantification in fish ovary studies.
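geNorm ranks candidate reference genes by a stability value M. For each gene, the standard deviation across samples of its log2 expression ratio with every other candidate is computed, and M is the mean of those standard deviations, so a lower M indicates more stable expression. The sketch below implements that definition, plus the simpler division by total cDNA concentration that the abstract proposes as an alternative. Function names and example values are hypothetical, and NormFinder's model-based approach is not reproduced here.

```python
import math
from statistics import mean, stdev


def genorm_m(expr: dict[str, list[float]]) -> dict[str, float]:
    """geNorm-style stability value M for each candidate reference gene.

    expr maps gene name -> relative quantities (linear scale), one per sample.
    M_j is the mean, over all other genes k, of the standard deviation across
    samples of log2(a_j / a_k). Lower M means more stable expression.
    """
    m = {}
    for j in expr:
        pairwise_sd = []
        for k in expr:
            if k == j:
                continue
            ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(stdev(ratios))
        m[j] = mean(pairwise_sd)
    return m


def normalize_by_cdna(target: list[float], cdna_conc: list[float]) -> list[float]:
    """Alternative strategy: divide each target quantity by total cDNA concentration."""
    return [t / c for t, c in zip(target, cdna_conc)]


# Hypothetical relative quantities for three candidate genes in four samples
expr = {
    "gene_a": [1.0, 2.0, 4.0, 8.0],
    "gene_b": [2.0, 4.0, 8.0, 16.0],  # constant ratio to gene_a
    "gene_c": [1.0, 1.0, 1.0, 1.0],
}
for gene, m in sorted(genorm_m(expr).items(), key=lambda kv: kv[1]):
    print(f"{gene}: M = {m:.3f}")
```

M is a relative measure: in this toy example the flat gene_c gets the highest M because the other two candidates co-vary with each other, which illustrates why stability rankings depend on which candidates are compared.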
Pulscher, L. A.; Charley, P. A.; Zhan, S.; Reasoner, C.; Burke, B.; Schountz, T.
Bats are exposed to a variety of pollutants, including cadmium (Cd), that can impair immune function and potentially increase viral shedding and burden. Despite this, little is known about the impacts of heavy metals on bats. This study aimed to determine the impacts of Cd exposure on bat T and B cell immune responses in naive and coronavirus-infected bats, and the impact of Cd on viral replication in Jamaican fruit bat (JFB; Artibeus jamaicensis) cells. To determine the impact of Cd exposure on adaptive immune responses, splenocyte cultures from naive and BANAL-52 coronavirus-infected JFB were treated with 0, 1, and 10 µM Cd and stimulated overnight with concanavalin A. RNA was extracted, and gene expression was assessed by SYBR Green qPCR. To determine whether Cd exposure increased viral replication, two JFB kidney cell clones were treated overnight with 0, 1, 10, and 50 µM CdCl2 and then infected with Cedar virus (CedV). Supernatants were collected and viral titers determined. Several transcripts were upregulated in both naive and virus-infected JFB splenocytes treated with Cd. B cell transcripts were significantly upregulated in a dose-dependent manner, and T cell transcripts were also increased in Cd-treated splenocytes. Transcripts associated with T cell subsets suggest a predominantly Th2 response in Cd-treated splenocytes. Viral replication did not differ significantly between Cd-treated and untreated kidney clones. These studies provide evidence that JFB adaptive immune responses are altered by exposure to low Cd concentrations.
Dai, H.-J.; Fang, L.-C.; Mir, T. H.; Chen, C.-T.; Feng, H.-H.; Lai, J.-R.; Hsu, H.-C.; Nandy, P.; Panchal, O.; Liao, W.-H.; Tien, Y.-Z.; Chen, P.-Z.; Lin, Y.-R.; Jonnagaddala, J.
Objectives Publicly available datasets dedicated to clinical speech de-identification tasks remain scarce due to privacy constraints and the complexity of speech-level annotation. To address this gap, we compiled the SREDH-AICup sensitive health information (SHI) speech corpus, a time-aligned clinical speech dataset annotated across 38 SHI categories. Methods Two publicly available English medical-domain datasets were adapted to support speech-level de-identification through script reformulation and controlled re-recording by 25 participants. Additional Mandarin Chinese clinical-style materials were incorporated to extend linguistic coverage. All audio data were annotated with millisecond-level, time-aligned SHI spans using Label Studio. Inter-annotator agreement was evaluated using Cohen's kappa after iterative calibration rounds. The resulting corpus supports both automatic speech recognition (ASR) and speech-level recognition of SHIs. Results The final dataset comprises 20 hours of annotated audio, divided into training (10 hours, 1,539 files), validation (5 hours, 775 files), and test (5 hours, 710 files) subsets, totalling 7,830 SHI entities. The language distribution reflects the composition of the selected source materials, with 19.36 hours of English and 0.89 hours of Mandarin Chinese speech. Discussion The corpus exhibits a long-tail distribution consistent with clinical documentation patterns and highlights the limited availability of Chinese medical speech resources. These characteristics underscore both the realism of the dataset and the structural challenges associated with multilingual speech de-identification. Conclusion The SREDH-AICup SHI speech corpus provides a clinically grounded, time-aligned speech dataset that supports automated medical speech de-identification research and facilitates future development of multilingual speech-based privacy protection systems.
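Cohen's kappa corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of units given the same label by both annotators and p_e is the chance agreement implied by each annotator's label frequencies. The sketch below computes it for two annotators labelling the same units. It assumes the time-aligned SHI spans have already been paired into comparable units (for example, per segment); that pairing step and the category labels shown are hypothetical, not taken from the corpus.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same sequence of units."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of units given the same label by both annotators
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[lab] * count_b[lab] for lab in labels) / n ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical per-segment labels; "O" marks segments containing no SHI
annotator_1 = ["NAME", "DATE", "O", "DATE", "NAME", "O"]
annotator_2 = ["NAME", "DATE", "O", "O", "NAME", "O"]
print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.2f}")
```

Here the annotators agree on 5 of 6 segments (p_o ≈ 0.83), but because both use "O" often, chance agreement is 1/3, so kappa is lower than raw agreement.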
KAMUANYA, N. C.; LOKOMBA, V. B.; MIKOBI, E. K. B.; MIKOBI, H. T. M.; LUKUSA, P. T.; Mikobi, T. M.
Sickle cell disease (SCD) is the most common inherited hemoglobinopathy worldwide. Improving the quality of life of people with SCD requires prenatal and neonatal screening. Our primary objective was to demonstrate that prenatal diagnosis of SCD is possible even in settings of poverty. Our secondary objective was to describe the socioeconomic profile of couples seeking molecular diagnosis of SCD in Kinshasa, Democratic Republic of Congo. Methods This was a cross-sectional study conducted in Kinshasa between January 2020 and December 2025, during which 107 couples underwent prenatal diagnosis. Prenatal diagnosis was performed by amniocentesis combined with FTA Elute technology, and the diagnosis was confirmed at birth using cord blood DNA extracted by the conventional salting-out technique. Results The mean age of the pregnant women was 28 ± 4 years. Eighty-one couples (75.7%) were Christian, nine (8.4%) were Muslim, and seventeen (15.8%) were animist. Eighty-two couples (76.6%) were known heterozygous AS couples, eleven (10.2%) were heterozygous couples, and fourteen (13.0%) consisted of one homozygous SS and one heterozygous AS partner. All pregnancies were singleton. Of the couples, 39.2% were of upper-middle-class socioeconomic status. The AS genotype was found in 79% of the fetuses. One intrauterine fetal death was observed after amniocentesis. In practice, FTA Elute technology reduced DNA extraction time to 30 minutes, was easy to use, and delivered results in less than 24 hours. Conclusion FTA Elute technology is a reliable, less expensive, and easy-to-use prenatal screening technique for sickle cell disease, and its sample transport and storage conditions are better suited to resource-limited settings.
Franziscus, C. A.; Ferrand, A.; Biehlmaier, O.; Schmidt, A.; Spang, A.
Cells contain different organelles and compartments that are essential for cellular function and life. These organelles and compartments need to communicate to assess cellular state in a changing environment, adapt to new situations, and ensure functionality and homeostasis. Moreover, organization and communication differ between cell types. However, our knowledge of these changes is still rather limited. Subcellular spatial proteomics aims to fill this knowledge gap. While proximity labeling techniques represent a great advance, they do not provide precise spatial resolution. To overcome this limitation, we developed SPEx (Subcellular spatial Proteomics coupled to Expansion), in which we first expand cells about 10-fold, laser-microdissect regions of interest, and then perform mass spectrometry-based proteomics on these samples. We demonstrate the effectiveness of SPEx by determining the proteomes of the Golgi, the nucleus and nucleoli, and we also identify novel components of these organelles. Because it combines inexpensive, already existing technologies, SPEx is readily usable by the wider scientific community.
Abdelhakim, M.; Althagafi, A.; SCHOFIELD, P.; Hoehndorf, R.
Genotype-phenotype databases are essential for variant interpretation and disease gene discovery. Genetic variation differs among human populations, mainly in allele frequencies and haplotype patterns shaped by ancestry and demographic history. Population-specific genotypes can influence traits and disease risk, which makes population-specific characterization important. Most existing resources characterize a population's genetic background but do not represent the resulting phenotypes. We have developed PAVS (Phenotype-Associated Variants in Saudi Arabia), a curated, publicly accessible database that integrates 5,132 Saudi clinical cases from four Saudi cohorts and 522 cases from the analysis of a mixed-population cohort, together with 1,856 cases from the Deciphering Developmental Disorders (DDD) study and 9,588 literature phenopackets. Each case record describes patient-level phenotypes, encoded with the Human Phenotype Ontology (HPO), and links them to genomic variants, gene identifiers, zygosity, pathogenicity classifications, and disease diagnoses mapped to standardized disease terminologies. The data are represented in Phenopackets format and as a knowledge graph in RDF. In addition, a web interface provides phenotype-based similarity search, gene and variant browsers, and an HPO hierarchy explorer. We evaluate the utility of the phenotype annotations for gene prioritization using semantic similarity. Although there are clear differences from global literature-curated databases, phenotypes in PAVS successfully rank the correct gene highly (ROC AUC: 0.89). PAVS addresses a gap in population-specific genotype-phenotype resources and provides a benchmark for phenotype-driven variant prioritization in under-represented populations.
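A ROC AUC such as the 0.89 reported above can be read as the probability that a randomly chosen correct (causative) gene receives a higher phenotype-similarity score than a randomly chosen incorrect candidate, with ties counted as half. The sketch below computes AUC directly from that pairwise definition. How PAVS pairs patients, candidate genes, and similarity scores is not described in the abstract, so the score lists are hypothetical.

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as the probability a positive outscores a negative (ties count 0.5)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical similarity scores: correct genes vs. other candidate genes
correct = [0.9, 0.8, 0.4]
other = [0.7, 0.3, 0.2, 0.1]
print(f"ROC AUC = {roc_auc(correct, other):.3f}")
```

The double loop is O(n·m); for large candidate sets, a rank-based (Mann-Whitney U) formulation gives the same value more efficiently.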
Di Blasio, S.; Middlekoop, A.; Molist, F.; Cord-Landwehr, S.; Elrayah, A. A.; Guardabassi, L.; Good, L.; Pelligand, L.
Managing post-weaning diarrhoea (PWD) in piglets is difficult because of restrictions on antibiotics and zinc. Chitosan is emerging as a potential feed additive. We analysed a chito-oligosaccharide hydrochloride (COS-HCl), a low molecular weight (LMW) chitosan, and a medium molecular weight (MMW) chitosan, and assessed their effects on growth, faecal consistency, microbiota, and potential interference with enterotoxigenic Escherichia coli (ETEC). The three chitosans were characterised using ¹H-NMR, SEC-RI-MS, and SEC-RI-MALLS. COS-HCl had an Mw of 0.824 kDa; LMW and MMW had Mw values of 14.4 kDa (range 0.3-30 kDa) and 116 kDa (range 15-600 kDa), respectively. Degrees of acetylation were 9.5%, 6.5%, and 15%, respectively. Two 42-day field studies evaluated average daily gain (ADG), faecal consistency, and microbiota. In the first trial, COS-HCl at 0.025-0.1% did not significantly affect ADG (−33 to −12 g/d). In the second, LMW and MMW at 0.01% did not significantly change ADG (−7 and +3 g/d, respectively). Faecal consistency, ETEC shedding, and microbiota composition were similar to controls. An enzymatic HPLC-MS method enabled quantification of MMW chitosan in premix. Our results highlight the importance of advanced chitosan characterisation for precision nutrition and suggest that a threshold dose may be needed to benefit growth and gut health in PWD management.
Bauman, A.; Owen, K.; Messing, S.; Macdonald, H.; Nettlefold, L.; Richards, J.; Vandelanotte, C.; Chen, I.-H.; Cullen, B.; van Buskirk, J.; van Itallie, A.; Coletta, G.; O'Halloran, P.; Randle, E.; Nicholson, M.; Staley, K.; McKay, H. A.
Military aviation training noise remains understudied despite its widespread impacts across urban, rural, and wilderness areas. The predominance of low-frequency noise and repetitive training can create pervasive noise pollution, yet past research often fails to capture the full range of health and quality-of-life effects. This study analyzed two complaint datasets related to Whidbey Island Naval Air Station noise: U.S. Navy records (2017-2020) and Quiet Skies Over San Juan County data (2021-2023). We analyzed and mapped sentiment intensity from noise complaints relative to modeled annual noise exposure, developed a typology to classify impacts, and modeled the environmental and operational factors influencing complaints. Findings revealed widespread negative sentiment and anger, often beyond the bounds of estimated noise contours, suggesting that annual cumulative noise models inadequately estimate community impacts. Complaints consistently highlighted sleep disturbance, hearing and health concerns, and compromised home environments due to shaking, vibration, and disruption of daily life. Residents also reported significant social, recreational, and work disruptions, along with feelings of fear, helplessness, and concern for children's well-being. The number of complaints was strongly associated with training schedules, with late-night sessions being the strongest predictor. A delayed response pattern suggests that residents reach a frustration threshold before filing complaints. Overall, our findings demonstrate persistent negative sentiment and diverse impacts from military aviation noise. The results highlight the need for improved noise metrics, modeling, and operational adjustments to mitigate the most disruptive effects.
McPhaul, T.; Kreimeyer, K.; Baris, A.; Botsis, T.
Cancer data standardization requires converting unstructured pathology reports into structured registry variables, a mostly manual and resource-intensive task. We evaluated two automated extraction platforms: Brim Analytics, an LLM-based system that guides and orchestrates abstraction, and DeepPhe, an ontology-driven system. Using 330 pancreatic adenocarcinoma and 34 breast cancer pathology reports from Johns Hopkins Hospital, we assessed both under deployment-realistic conditions. Brim Analytics achieved high accuracy across seven registry variables in pancreatic cancer (mean 96.7%), including T stage (96.4%) and histologic grade (97.0%), with a 3.0 percentage-point decline on breast cancer (mean 93.7%). DeepPhe performed comparably for N stage (96.4% pancreatic, 94.1% breast) but had notable T stage deficits (83.6% pancreatic, 70.6% breast). Per-report processing times averaged 0.9 s (Brim, pancreatic), 4.6 s (Brim, breast), 1.1 s (DeepPhe, pancreatic), and 3.5 s (DeepPhe, breast). These results indicate that LLM-based extraction can achieve high accuracy across cancer types and support automated data workflows.
Mohsini, K.; Gore-Langton, G. R.; Rathod, S. D.; Mansfield, K. E.; Warren-Gash, C.
Aims Indoor air pollution resulting from combustion of unclean cooking fuels has been linked to adverse health outcomes, but evidence regarding its association with mental health in low- and middle-income countries remains limited. We investigated the association between household use of unclean cooking fuels, as a proxy for indoor air pollution, and depression symptoms among adults aged 45 years and older in India, and assessed effect modification by age, sex, caste, and rural/urban residence. Methods We conducted a cross-sectional analysis of the first wave (2017-2018) of data from the Longitudinal Aging Study in India (LASI), a nationally representative survey of adults aged ≥45 years. Cooking fuel type was classified as clean or unclean, and depression symptoms were assessed using the 10-item Centre for Epidemiologic Studies Depression (CES-D-10) scale. We used logistic regression to estimate odds ratios for depression symptoms, and linear regression to compare mean CES-D-10 scores by cooking fuel type, adjusting for sociodemographic and housing characteristics. Results We included 62,650 respondents. Median age was 57 years (IQR: 50-65), 46.7% were women, 47.6% reported using unclean cooking fuels, and 27.6% screened positive on the CES-D-10. After adjusting for sociodemographic and housing characteristics, use of unclean cooking fuels was associated with higher odds of screening positive on the CES-D-10 (aOR: 1.08; 95% CI: 1.02, 1.15), and higher mean CES-D-10 scores (adjusted mean difference: 0.34; 95% CI: 0.24, 0.44). The association was more pronounced among individuals living in urban areas (aOR: 1.36; 95% CI: 1.21, 1.53). Conclusion Use of unclean cooking fuels was associated with depression symptoms among older adults in India, and especially among those living in urban areas.
Mthiyane, N.; Ndlovu, S.; Kiragga, A.; Tasner, F.; Bunker, A.; Cumbe, V.; Ramiro, I.; Odero, H.; Omondi, E.; Liyanage, P.; Lindner, E.; Traore, N.; Sie, A.; Barnighausen, T.; Otieno, F.; Wambua, G. N.; Akinyi, L. J.; Khagayi, S.; Mulopo, C.; Wekesah, F. M.; Treffry-Goatley, A.; Black, G. F.; Iwuji, C.
Background: Extreme weather events (EWEs) are increasing in frequency and intensity due to climate change. EWEs negatively affect both physical and mental health, with vulnerable populations disproportionately affected. Limited data on the specific effects of EWEs on mental health in Africa highlight the need for more research to guide policy and practice. The WEMA study aims to explore the impact of EWEs, in particular storms, cyclones, flooding, and heavy rainfall, on common mental disorders (CMDs) in Burkina Faso, Kenya, Mozambique, and South Africa. Methods: This study will employ a transdisciplinary research approach integrating qualitative and quantitative methods to generate contextually grounded and policy-relevant evidence on the mental health impacts of EWEs in sub-Saharan Africa (SSA). We will begin with a rapid literature review, guided by the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA), to synthesise existing evidence on the relationship between EWEs and mental health. Secondary analysis of health and demographic surveillance system (HDSS) data across multiple African sites will assess the temporal association between temperature, precipitation, and mental health-related morbidity and mortality, using time-series regression with distributed lag non-linear models. In parallel, cross-sectional surveys will estimate the prevalence of CMDs among adults exposed and unexposed to flooding. Logistic regression, adjusting for confounders, will be used to estimate odds ratios for the association between flooding and CMDs. An embedded qualitative study will involve thematic analysis of digital stories produced by community-based co-researchers through participatory workshops, capturing lived experiences of EWEs. Findings from the quantitative and qualitative components will be synthesised and disseminated through knowledge exchange meetings to bridge scientific and experiential insights and inform locally relevant interventions.
Discussion: The pool of evidence generated through this transdisciplinary study will be widely shared to draw attention to the impact of EWEs on mental health and to inform relevant policy and practice. Through this work, we aim to advance locally relevant climate adaptation strategies to help reduce health inequalities and support the psychosocial well-being of affected communities.
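The Methods propose logistic regression, adjusted for confounders, to estimate odds ratios for flooding and CMDs. The Python below is a minimal sketch of that technique and is not the study's analysis code, which the protocol does not describe. It fits a logistic model by Newton-Raphson and exponentiates the exposure coefficient to obtain an odds ratio. The single binary confounder and all counts are invented for illustration. They are chosen so that the crude odds ratio (4.2) differs from the confounder-adjusted one (2.0), because the confounder is associated with both exposure and outcome.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int], iters: int = 25) -> List[float]:
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Each row of X must start with a 1.0 intercept term."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                      # score vector X^T (y - p)
        info = [[0.0] * k for _ in range(k)]  # information matrix X^T W X
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for l in range(k):
                    info[j][l] += w * xi[j] * xi[l]
        step = solve(info, grad)              # Newton step: info^-1 @ grad
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical aggregated counts: (flood_exposed, confounder, has_cmd, n_people)
CELLS = [
    (1, 0, 1, 10), (1, 0, 0, 40), (0, 0, 1, 20), (0, 0, 0, 160),
    (1, 1, 1, 60), (1, 1, 0, 60), (0, 1, 1, 10), (0, 1, 0, 20),
]
rows = [(e, c, d) for e, c, d, n in CELLS for _ in range(n)]
y = [d for _, _, d in rows]

# Crude model: intercept + exposure; adjusted model adds the confounder.
crude = fit_logistic([[1.0, float(e)] for e, _, _ in rows], y)
adjusted = fit_logistic([[1.0, float(e), float(c)] for e, c, _ in rows], y)
crude_or = math.exp(crude[1])
adjusted_or = math.exp(adjusted[1])
print(f"crude OR = {crude_or:.2f}, adjusted OR = {adjusted_or:.2f}")
```

In practice a statistical package would also report confidence intervals. The hand-rolled solver here only keeps the sketch dependency-free.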
Navaratnam, A. M. D.; Bishop, T. R. P.; Tatah, L.; Williams, H.; Spadaro, J. V.; Khreis, H.
Background: Ambient air pollution is a leading global health risk and disproportionately affects populations of low- and middle-income countries (LMICs). In 2021, WHO revised its Air Quality Guidelines (AQG), lowering recommended annual limits for fine particulate matter (PM2.5) and nitrogen dioxide (NO2). We estimated the potential health and economic impacts of achieving WHO Interim Target 3 (IT3) and AQG concentrations across LMICs. Methods: We conducted a health impact assessment across 136 LMICs to quantify one-year changes in all-cause and cause-specific mortality (chronic obstructive pulmonary disease [COPD], ischaemic heart disease [IHD], and stroke) and disease incidence (COPD, dementia, IHD, and stroke) under WHO IT3 and AQG counterfactual scenarios for PM2.5 and NO2. Concentration-response functions were applied at 1 km × 1 km resolution. Economic welfare impacts of mortality risk reductions were estimated using country-adjusted values of a statistical life (VSL; Int$, PPP-adjusted, 2021). Direct medical and productivity-related costs associated with incident cases were estimated using a cost-of-illness (COI) framework. Uncertainty intervals (UI) reflect uncertainty in the concentration-response functions. Results: Attainment of the WHO IT3 and AQG concentrations for PM2.5 was associated with estimated reductions in annual deaths of 16.04% (6.58 million; UI: 6.10-7.07 million) and 22.97% (9.43 million; UI: 8.75-10.11 million), respectively. The corresponding VSL-based values of the deaths averted were Int$5.5 trillion (7.0% of aggregate LMIC GDP) and Int$8.4 trillion (10.6% of GDP). For NO2, the IT3 and AQG scenarios were associated with estimated reductions of approximately 1.06% (about 435,000 deaths; UI: 388,000-483,000) and 2.79%, respectively, yielding gains of Int$0.6 trillion (0.7% of GDP) and Int$1.5 trillion (1.9% of GDP). Disease-specific mortality reductions were most prominent for IHD and stroke in Asia and Africa.
Under the PM2.5 AQG scenario, an estimated 2.82 million (UI: 1.67-2.97 million) COPD, 1.10 million (0.83-1.37 million) dementia, 7.3 million (6.41-8.19 million) IHD, and 2.3 million (2.19-2.41 million) stroke cases could be delayed or averted in one year. Associated reductions in direct medical and productivity-related costs were greatest for IHD, COPD, and stroke. NO2-related morbidity reductions were smaller across all outcomes. All estimates represent one-year changes in risk relative to counterfactual exposure and may reflect delayed rather than permanently avoided events. Discussion: Achieving WHO IT3 and AQG values in LMICs could yield substantial reductions in premature mortality and disease incidence, particularly for cardiovascular and respiratory conditions, alongside large monetised welfare gains from reduced mortality risk. These findings underscore the considerable societal value of air quality improvements and support accelerated action toward meeting WHO guideline levels in regions bearing the highest pollution burden.
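The Methods apply concentration-response functions on a 1 km grid, then value the results with VSL and cost-of-illness figures. The Python below is a minimal sketch of that calculation chain for PM2.5, built on several assumptions. It uses a log-linear concentration-response function, which is a common HIA form; the abstract does not say which functions were used. It also uses a hypothetical relative risk of 1.08 per 10 µg/m³, two invented grid cells, and placeholder monetary values. The targets of 15 and 5 µg/m³ are the WHO 2021 annual PM2.5 interim target 3 and guideline level. All function and field names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class GridCell:
    """One hypothetical 1 km x 1 km cell (all values are illustrative)."""
    population: float
    baseline_deaths_per_person: float  # annual all-cause mortality rate
    pm25: float                        # annual mean concentration, ug/m3


def deaths_averted(cells: Iterable[GridCell], target: float, rr_per_10: float) -> float:
    """Deaths averted in one year if every cell above `target` were lowered to it.

    Assumes a log-linear concentration-response function RR = exp(beta * dC),
    with beta derived from a relative risk per 10 ug/m3."""
    beta = math.log(rr_per_10) / 10.0
    total = 0.0
    for cell in cells:
        delta = max(cell.pm25 - target, 0.0)  # cells already at or below target contribute nothing
        attributable_fraction = 1.0 - math.exp(-beta * delta)
        total += cell.population * cell.baseline_deaths_per_person * attributable_fraction
    return total


def monetise(deaths: float, vsl: float,
             cases_averted: Dict[str, float],
             cost_per_case: Dict[str, float]) -> Tuple[float, float]:
    """Return (VSL-based welfare value, cost-of-illness savings), reported separately."""
    welfare = deaths * vsl
    coi = sum(n * cost_per_case[disease] for disease, n in cases_averted.items())
    return welfare, coi


WHO_PM25_IT3 = 15.0  # ug/m3, annual mean
WHO_PM25_AQG = 5.0   # ug/m3, annual mean

cells = [
    GridCell(population=1_000_000, baseline_deaths_per_person=0.008, pm25=25.0),
    GridCell(population=500_000, baseline_deaths_per_person=0.006, pm25=4.0),
]
for name, target in [("IT3", WHO_PM25_IT3), ("AQG", WHO_PM25_AQG)]:
    averted = deaths_averted(cells, target, rr_per_10=1.08)
    print(name, round(averted))
```

The VSL value and the cost-of-illness savings are returned separately rather than summed, which mirrors how the abstract reports them.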
Hou, Y.; Cohen, E.; Higginbottom, J.; Rountree, L.; Ren, Y.; Wahl, B.; Nyhan, K.; Mukherjee, B.
India's national research capacity and infrastructure are unevenly distributed across states and union territories (UTs), contributing to geographic variation in academic publication output. We developed Indiapub, an open-access web application that quantitatively enumerates and visually displays geographic and temporal publication patterns for research products with at least one author affiliated with an Indian institution, using OpenAlex data. The app is designed for ease of use, with automated data retrieval, cleaning, and aggregation. Indiapub allows users to filter publications by topic, publication year range, author position, publication type, minimum citation count, state/UT, and population size of the state/UT where the author institution is located. The app also provides downloadable tables and ranked institution lists by publication count. Its interactive dashboard includes five modules: (i) a map of publication distribution, (ii) time trend plots for nation and state/UT, (iii) publication-share versus population-share plots highlighting over- and underrepresentation, (iv) stacked bar charts of state/UT contributions over time with population benchmarks, and (v) bubble plots relating the Human Development Index to publication volume over time. This tool may support resource prioritization and identification of institutional strengths for trainees, researchers, higher education administrators, and policymakers. To illustrate its utility, we present sample findings derived from the app. For publications across all topics from 2014 to 2025, the largest research participation footprints were observed in Tamil Nadu, Maharashtra, Delhi, Uttar Pradesh, and Karnataka. Tamil Nadu and Delhi were home to three of the highest-publishing institutions nationally: Vellore Institute of Technology, All India Institute of Medical Sciences, and Indian Institute of Technology Delhi. 
We also examined six curated case studies of broad scientific interest: electronic health records (EHR), genome-wide association studies (GWAS), artificial intelligence (AI), development economics, environmental science, and COVID-19. Findings from these case studies revealed over- and underrepresentation in publication output across states and UTs. For example, in EHR publications among high-population states, Tamil Nadu's publication share exceeded its population share by 31.3 percentage points (pp), whereas Bihar's was 12.8 pp lower. Our tool offers insights into India's research landscape across states and UTs with easy-to-digest visuals. Such interactive tools have the potential to serve as a starting point for fostering a more inclusive research ecosystem supporting targeted research policy and planning.
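The publication-share versus population-share module, and the percentage-point gaps reported for Tamil Nadu and Bihar, come down to simple arithmetic. The Python below is a minimal sketch of it, with several assumptions. The publication records are hypothetical and already reduced to a year plus the states/UTs of Indian-affiliated authors; Indiapub's actual OpenAlex processing is not described here. Each state is counted at most once per publication. Shares are taken over the summed state counts and over a hypothetical population table.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Publication:
    """Hypothetical, pre-processed publication record."""
    year: int
    states: Tuple[str, ...]  # states/UTs of the authors' Indian institutions


def state_counts(pubs: Iterable[Publication], start: int, end: int) -> Counter:
    """Count publications per state/UT within an inclusive year range."""
    counts: Counter = Counter()
    for pub in pubs:
        if start <= pub.year <= end:
            counts.update(set(pub.states))  # each state counted at most once per publication
    return counts


def representation_gap_pp(counts: Counter, population: Dict[str, float]) -> Dict[str, float]:
    """Publication share minus population share, in percentage points."""
    total_pubs = sum(counts.values())
    total_pop = sum(population.values())
    return {
        state: 100.0 * (counts[state] / total_pubs - pop / total_pop)
        for state, pop in population.items()
    }


pubs = [
    Publication(2015, ("Tamil Nadu",)),
    Publication(2016, ("Tamil Nadu", "Bihar")),
    Publication(2020, ("Tamil Nadu", "Tamil Nadu")),  # two authors, same state
    Publication(2010, ("Bihar",)),                     # outside the year range
]
counts = state_counts(pubs, 2014, 2025)
gaps = representation_gap_pp(counts, {"Tamil Nadu": 1.0, "Bihar": 1.0})  # hypothetical equal populations
print(gaps)
```

A positive gap marks over-representation relative to population and a negative gap marks under-representation, matching the sign convention of the Tamil Nadu (+31.3 pp) and Bihar (-12.8 pp) example.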